WeRateDogs Analysis

1. Gather Data

In [128]:
import requests
import pandas as pd
import io
import tweepy
import json
import sqlite3
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter
import matplotlib.dates as dates
from IPython.core.display import Image, display
import urllib.request

1.1 WeRateDogs archive

Gather data from local csv file.

In [2]:
twitter_archive_df = pd.read_csv('../data/twitter-archive-enhanced.csv')
In [3]:
twitter_archive_df.head(20)
Out[3]:
tweet_id in_reply_to_status_id in_reply_to_user_id timestamp source text retweeted_status_id retweeted_status_user_id retweeted_status_timestamp expanded_urls rating_numerator rating_denominator name doggo floofer pupper puppo
0 892420643555336193 NaN NaN 2017-08-01 16:23:56 +0000 <a href="http://twitter.com/download/iphone" r... This is Phineas. He's a mystical boy. Only eve... NaN NaN NaN https://twitter.com/dog_rates/status/892420643... 13 10 Phineas None None None None
1 892177421306343426 NaN NaN 2017-08-01 00:17:27 +0000 <a href="http://twitter.com/download/iphone" r... This is Tilly. She's just checking pup on you.... NaN NaN NaN https://twitter.com/dog_rates/status/892177421... 13 10 Tilly None None None None
2 891815181378084864 NaN NaN 2017-07-31 00:18:03 +0000 <a href="http://twitter.com/download/iphone" r... This is Archie. He is a rare Norwegian Pouncin... NaN NaN NaN https://twitter.com/dog_rates/status/891815181... 12 10 Archie None None None None
3 891689557279858688 NaN NaN 2017-07-30 15:58:51 +0000 <a href="http://twitter.com/download/iphone" r... This is Darla. She commenced a snooze mid meal... NaN NaN NaN https://twitter.com/dog_rates/status/891689557... 13 10 Darla None None None None
4 891327558926688256 NaN NaN 2017-07-29 16:00:24 +0000 <a href="http://twitter.com/download/iphone" r... This is Franklin. He would like you to stop ca... NaN NaN NaN https://twitter.com/dog_rates/status/891327558... 12 10 Franklin None None None None
5 891087950875897856 NaN NaN 2017-07-29 00:08:17 +0000 <a href="http://twitter.com/download/iphone" r... Here we have a majestic great white breaching ... NaN NaN NaN https://twitter.com/dog_rates/status/891087950... 13 10 None None None None None
6 890971913173991426 NaN NaN 2017-07-28 16:27:12 +0000 <a href="http://twitter.com/download/iphone" r... Meet Jax. He enjoys ice cream so much he gets ... NaN NaN NaN https://gofundme.com/ydvmve-surgery-for-jax,ht... 13 10 Jax None None None None
7 890729181411237888 NaN NaN 2017-07-28 00:22:40 +0000 <a href="http://twitter.com/download/iphone" r... When you watch your owner call another dog a g... NaN NaN NaN https://twitter.com/dog_rates/status/890729181... 13 10 None None None None None
8 890609185150312448 NaN NaN 2017-07-27 16:25:51 +0000 <a href="http://twitter.com/download/iphone" r... This is Zoey. She doesn't want to be one of th... NaN NaN NaN https://twitter.com/dog_rates/status/890609185... 13 10 Zoey None None None None
9 890240255349198849 NaN NaN 2017-07-26 15:59:51 +0000 <a href="http://twitter.com/download/iphone" r... This is Cassie. She is a college pup. Studying... NaN NaN NaN https://twitter.com/dog_rates/status/890240255... 14 10 Cassie doggo None None None
10 890006608113172480 NaN NaN 2017-07-26 00:31:25 +0000 <a href="http://twitter.com/download/iphone" r... This is Koda. He is a South Australian decksha... NaN NaN NaN https://twitter.com/dog_rates/status/890006608... 13 10 Koda None None None None
11 889880896479866881 NaN NaN 2017-07-25 16:11:53 +0000 <a href="http://twitter.com/download/iphone" r... This is Bruno. He is a service shark. Only get... NaN NaN NaN https://twitter.com/dog_rates/status/889880896... 13 10 Bruno None None None None
12 889665388333682689 NaN NaN 2017-07-25 01:55:32 +0000 <a href="http://twitter.com/download/iphone" r... Here's a puppo that seems to be on the fence a... NaN NaN NaN https://twitter.com/dog_rates/status/889665388... 13 10 None None None None puppo
13 889638837579907072 NaN NaN 2017-07-25 00:10:02 +0000 <a href="http://twitter.com/download/iphone" r... This is Ted. He does his best. Sometimes that'... NaN NaN NaN https://twitter.com/dog_rates/status/889638837... 12 10 Ted None None None None
14 889531135344209921 NaN NaN 2017-07-24 17:02:04 +0000 <a href="http://twitter.com/download/iphone" r... This is Stuart. He's sporting his favorite fan... NaN NaN NaN https://twitter.com/dog_rates/status/889531135... 13 10 Stuart None None None puppo
15 889278841981685760 NaN NaN 2017-07-24 00:19:32 +0000 <a href="http://twitter.com/download/iphone" r... This is Oliver. You're witnessing one of his m... NaN NaN NaN https://twitter.com/dog_rates/status/889278841... 13 10 Oliver None None None None
16 888917238123831296 NaN NaN 2017-07-23 00:22:39 +0000 <a href="http://twitter.com/download/iphone" r... This is Jim. He found a fren. Taught him how t... NaN NaN NaN https://twitter.com/dog_rates/status/888917238... 12 10 Jim None None None None
17 888804989199671297 NaN NaN 2017-07-22 16:56:37 +0000 <a href="http://twitter.com/download/iphone" r... This is Zeke. He has a new stick. Very proud o... NaN NaN NaN https://twitter.com/dog_rates/status/888804989... 13 10 Zeke None None None None
18 888554962724278272 NaN NaN 2017-07-22 00:23:06 +0000 <a href="http://twitter.com/download/iphone" r... This is Ralphus. He's powering up. Attempting ... NaN NaN NaN https://twitter.com/dog_rates/status/888554962... 13 10 Ralphus None None None None
19 888202515573088257 NaN NaN 2017-07-21 01:02:36 +0000 <a href="http://twitter.com/download/iphone" r... RT @dog_rates: This is Canela. She attempted s... 8.874740e+17 4.196984e+09 2017-07-19 00:47:34 +0000 https://twitter.com/dog_rates/status/887473957... 13 10 Canela None None None None
In [4]:
twitter_archive_df.shape
Out[4]:
(2356, 17)

1.2 Dog breed prediction

Gather data from a URL with the requests library.

In [5]:
r = requests.get('https://d17h27t6h515a5.cloudfront.net/topher/2017/August/599fd2ad_image-predictions/image-predictions.tsv')
In [6]:
r.status_code
Out[6]:
200
In [7]:
r.headers['content-type']
Out[7]:
'text/tab-separated-values; charset=utf-8'
In [8]:
raw_data = r.content
In [9]:
prediction_df = pd.read_csv(io.StringIO(raw_data.decode('utf-8')), sep='\t')
prediction_df.tail(2)
Out[9]:
tweet_id jpg_url img_num p1 p1_conf p1_dog p2 p2_conf p2_dog p3 p3_conf p3_dog
2073 892177421306343426 https://pbs.twimg.com/media/DGGmoV4XsAAUL6n.jpg 1 Chihuahua 0.323581 True Pekinese 0.090647 True papillon 0.068957 True
2074 892420643555336193 https://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg 1 orange 0.097049 False bagel 0.085851 False banana 0.076110 False
In [157]:
# write tsv
prediction_df.to_csv('../data/image-predictions.tsv', sep='\t')

1.3 Twitter like and retweet counts

Gather data via a twitter API. Key and tokens are removed.

In [12]:
consumer_key = 'XXXXXX'
consumer_secret = 'XXXXXX'
access_token ='XXXXXX'
access_token_secret = 'XXXXXX'

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
In [13]:
tweets = []
for tweet_id in list(twitter_archive_df['tweet_id']):
    try:
        tweet = api.get_status(tweet_id) #, tweet_mode='extended')
        tweets.append(tweet)
    except tweepy.TweepError as e:
        print(e.response.text)
{"errors":[{"code":144,"message":"No status found with that ID."}]}
{"errors":[{"code":144,"message":"No status found with that ID."}]}
{"errors":[{"code":144,"message":"No status found with that ID."}]}
{"errors":[{"code":144,"message":"No status found with that ID."}]}
{"errors":[{"code":144,"message":"No status found with that ID."}]}
{"errors":[{"code":144,"message":"No status found with that ID."}]}
{"errors":[{"code":144,"message":"No status found with that ID."}]}
{"errors":[{"code":144,"message":"No status found with that ID."}]}
{"errors":[{"code":144,"message":"No status found with that ID."}]}
{"errors":[{"code":144,"message":"No status found with that ID."}]}
{"errors":[{"code":144,"message":"No status found with that ID."}]}
{"errors":[{"code":144,"message":"No status found with that ID."}]}
{"errors":[{"code":144,"message":"No status found with that ID."}]}
{"errors":[{"code":144,"message":"No status found with that ID."}]}
{"errors":[{"code":144,"message":"No status found with that ID."}]}
{"errors":[{"code":144,"message":"No status found with that ID."}]}
{"errors":[{"code":144,"message":"No status found with that ID."}]}
Rate limit reached. Sleeping for: 69
{"errors":[{"code":144,"message":"No status found with that ID."}]}
{"errors":[{"code":144,"message":"No status found with that ID."}]}
In [150]:
len(tweets)
Out[150]:
2337
In [151]:
# extract json content of every tweet into a list
my_list_of_dicts = []
for each_json_tweet in tweets:
    my_list_of_dicts.append(each_json_tweet._json)
In [152]:
# write every tweet's json into a txt file
with open('../data/tweet_json.txt', 'w') as file:
        file.write(json.dumps(my_list_of_dicts, indent=4))
In [153]:
# extract the needed variables and convert it to a pandas dataframe
tweet_list = []
with open('../data/tweet_json.txt', encoding='utf-8') as json_file:  
    all_data = json.load(json_file)
    for each_dictionary in all_data:
        tweet_id = each_dictionary['id']
        text = each_dictionary['text']
        favorite_count = each_dictionary['favorite_count']
        retweet_count = each_dictionary['retweet_count']
        created_at = each_dictionary['created_at']
        tweet_list.append({'tweet_id': str(tweet_id),
                             'text': str(text),
                             'favorite_count': int(favorite_count),
                             'retweet_count': int(retweet_count),
                             'created_at': created_at,
                            })
    tweet_json = pd.DataFrame(tweet_list, columns = 
                                  ['tweet_id', 'text', 
                                   'favorite_count', 'retweet_count', 
                                   'created_at'])
In [154]:
tweet_json.shape
Out[154]:
(2337, 5)
In [248]:
tweet_json.head()
Out[248]:
tweet_id text favorite_count retweet_count created_at
0 892420643555336193 This is Phineas. He's a mystical boy. Only eve... 37588 8198 Tue Aug 01 16:23:56 +0000 2017
1 892177421306343426 This is Tilly. She's just checking pup on you.... 32306 6060 Tue Aug 01 00:17:27 +0000 2017
2 891815181378084864 This is Archie. He is a rare Norwegian Pouncin... 24348 4009 Mon Jul 31 00:18:03 +0000 2017
3 891689557279858688 This is Darla. She commenced a snooze mid meal... 40932 8357 Sun Jul 30 15:58:51 +0000 2017
4 891327558926688256 This is Franklin. He would like you to stop ca... 39127 9050 Sat Jul 29 16:00:24 +0000 2017
In [21]:
# write dataframe to a csv file
tweet_json.to_csv('../data/tweet_dogs.csv', index=False)
In [22]:
# read tweet_dogs.csv (to avoid accessing the API every time)
tweets_df = pd.read_csv('../data/tweet_dogs.csv')
tweets_df.head(20)
Out[22]:
tweet_id text favorite_count retweet_count created_at
0 892420643555336193 This is Phineas. He's a mystical boy. Only eve... 37588 8198 Tue Aug 01 16:23:56 +0000 2017
1 892177421306343426 This is Tilly. She's just checking pup on you.... 32306 6060 Tue Aug 01 00:17:27 +0000 2017
2 891815181378084864 This is Archie. He is a rare Norwegian Pouncin... 24348 4009 Mon Jul 31 00:18:03 +0000 2017
3 891689557279858688 This is Darla. She commenced a snooze mid meal... 40932 8357 Sun Jul 30 15:58:51 +0000 2017
4 891327558926688256 This is Franklin. He would like you to stop ca... 39127 9050 Sat Jul 29 16:00:24 +0000 2017
5 891087950875897856 Here we have a majestic great white breaching ... 19681 3005 Sat Jul 29 00:08:17 +0000 2017
6 890971913173991426 Meet Jax. He enjoys ice cream so much he gets ... 11503 1985 Fri Jul 28 16:27:12 +0000 2017
7 890729181411237888 When you watch your owner call another dog a g... 63431 18212 Fri Jul 28 00:22:40 +0000 2017
8 890609185150312448 This is Zoey. She doesn't want to be one of th... 27059 4122 Thu Jul 27 16:25:51 +0000 2017
9 890240255349198849 This is Cassie. She is a college pup. Studying... 31004 7120 Wed Jul 26 15:59:51 +0000 2017
10 890006608113172480 This is Koda. He is a South Australian decksha... 29817 7078 Wed Jul 26 00:31:25 +0000 2017
11 889880896479866881 This is Bruno. He is a service shark. Only get... 27032 4810 Tue Jul 25 16:11:53 +0000 2017
12 889665388333682689 Here's a puppo that seems to be on the fence a... 46748 9693 Tue Jul 25 01:55:32 +0000 2017
13 889638837579907072 This is Ted. He does his best. Sometimes that'... 26357 4377 Tue Jul 25 00:10:02 +0000 2017
14 889531135344209921 This is Stuart. He's sporting his favorite fan... 14698 2173 Mon Jul 24 17:02:04 +0000 2017
15 889278841981685760 This is Oliver. You're witnessing one of his m... 24543 5197 Mon Jul 24 00:19:32 +0000 2017
16 888917238123831296 This is Jim. He found a fren. Taught him how t... 28317 4348 Sun Jul 23 00:22:39 +0000 2017
17 888804989199671297 This is Zeke. He has a new stick. Very proud o... 24836 4132 Sat Jul 22 16:56:37 +0000 2017
18 888554962724278272 This is Ralphus. He's powering up. Attempting ... 19274 3415 Sat Jul 22 00:23:06 +0000 2017
19 888078434458587136 This is Gerald. He was just told he didn't get... 21153 3363 Thu Jul 20 16:49:33 +0000 2017

2. Tidying + Cleaning

Cleaning

  • convert timestamp and created_at columns to datetime
  • convert doggo, floofer, pupper and puppo columns to bool
  • remove retreats (tweets without images)
  • reomving tweets without dog name
  • remove tweets with retweet id
  • remove rows where rating_denominator != 10
  • remove when p1_dog (and p2 ?) is False in third df
  • remove newer than August 1st, 2017.

Tidying

  • inner join on tweet_id (keep only tweets where data is available in all 3 dfs)
  • take ranking from first df
  • take name in from first df
  • extract image from second df
  • take retreat count from third df
  • take favorite count from third df
  • take jpg_url from second df
  • take all 3 predictions and confidence values from third df
  • remove needless columns

2.1 Cleaning

The cleaning step is chosen first, to see more dirty data in single dataframes before joining all three dataframes. Usually Tidying would be the first step.

In [23]:
# copy every dataframe
twitter_archive_df_cleaned = twitter_archive_df.copy()
prediction_df_cleaned = prediction_df.copy()
tweets_df_cleaned = tweets_df.copy()

Get info about data types of every dataframe.

In [24]:
twitter_archive_df_cleaned.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2356 entries, 0 to 2355
Data columns (total 17 columns):
tweet_id                      2356 non-null int64
in_reply_to_status_id         78 non-null float64
in_reply_to_user_id           78 non-null float64
timestamp                     2356 non-null object
source                        2356 non-null object
text                          2356 non-null object
retweeted_status_id           181 non-null float64
retweeted_status_user_id      181 non-null float64
retweeted_status_timestamp    181 non-null object
expanded_urls                 2297 non-null object
rating_numerator              2356 non-null int64
rating_denominator            2356 non-null int64
name                          2356 non-null object
doggo                         2356 non-null object
floofer                       2356 non-null object
pupper                        2356 non-null object
puppo                         2356 non-null object
dtypes: float64(4), int64(3), object(10)
memory usage: 313.0+ KB
In [25]:
prediction_df_cleaned.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2075 entries, 0 to 2074
Data columns (total 12 columns):
tweet_id    2075 non-null int64
jpg_url     2075 non-null object
img_num     2075 non-null int64
p1          2075 non-null object
p1_conf     2075 non-null float64
p1_dog      2075 non-null bool
p2          2075 non-null object
p2_conf     2075 non-null float64
p2_dog      2075 non-null bool
p3          2075 non-null object
p3_conf     2075 non-null float64
p3_dog      2075 non-null bool
dtypes: bool(3), float64(3), int64(2), object(4)
memory usage: 152.1+ KB
In [26]:
tweets_df_cleaned.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2337 entries, 0 to 2336
Data columns (total 5 columns):
tweet_id          2337 non-null int64
text              2337 non-null object
favorite_count    2337 non-null int64
retweet_count     2337 non-null int64
created_at        2337 non-null object
dtypes: int64(3), object(2)
memory usage: 91.4+ KB

Check if every tweet id is unique and there are no duplicates.

In [249]:
twitter_archive_df_cleaned['tweet_id'].nunique()
twitter_archive_df_cleaned['tweet_id'].duplicated().any()
Out[249]:
False

Convert to datetime:

In [28]:
# convert date columns to datetime
twitter_archive_df_cleaned['timestamp'] = pd.to_datetime(twitter_archive_df_cleaned['timestamp'])
tweets_df_cleaned['created_at'] = pd.to_datetime(tweets_df_cleaned['created_at'])
print(type(twitter_archive_df_cleaned['timestamp'][0]), type(tweets_df_cleaned['created_at'][0]))
<class 'pandas._libs.tslibs.timestamps.Timestamp'> <class 'pandas._libs.tslibs.timestamps.Timestamp'>

Make data type of doggo, floofer, pupper and puppo columns bool values:

In [29]:
# doggo, floofer, pupper and puppo columns to bool
#twitter_archive_df_cleaned['puppo'].unique() # to explore
twitter_archive_df_cleaned['doggo'].replace({'doggo': True, 'None': False}, inplace=True)
twitter_archive_df_cleaned['floofer'].replace({'floofer': True, 'None': False}, inplace=True)
twitter_archive_df_cleaned['pupper'].replace({'pupper': True, 'None': False}, inplace=True)
twitter_archive_df_cleaned['puppo'].replace({'puppo': True, 'None': False}, inplace=True)

print(twitter_archive_df_cleaned['doggo'].unique()) # to test
print(twitter_archive_df_cleaned['floofer'].unique()) # to test
print(twitter_archive_df_cleaned['pupper'].unique()) # to test
print(twitter_archive_df_cleaned['puppo'].unique()) # to test
[False  True]
[False  True]
[False  True]
[False  True]

Tweets without image url are no original rating. Remove them:

In [30]:
# tweets without image url are no original rating
#print(twitter_archive_df[twitter_archive_df['expanded_urls'].isnull()]) # for exploration
rows_before = twitter_archive_df_cleaned.shape[0]
twitter_archive_df_cleaned = twitter_archive_df_cleaned.dropna(axis=0, subset=['expanded_urls'])
print("Dropped ", rows_before-twitter_archive_df_cleaned.shape[0], " due to non-original rating tweets.")
Dropped  59  due to non-original rating tweets.

Rating tweets without dog name are not wanted for this analysis later. Remove them:

In [31]:
# rating tweets without dog name
#twitter_archive_df[twitter_archive_df['name'] == "None"].shape[0] # to explore
rows_before = twitter_archive_df_cleaned.shape[0]
twitter_archive_df_cleaned = twitter_archive_df_cleaned[twitter_archive_df_cleaned['name'] != "None"]
print("Dropped ", rows_before-twitter_archive_df_cleaned.shape[0], " due to unknown dog name.")
Dropped  686  due to unknown dog name.

Remove tweets with retweeted_status_id, because they are retweet-tweets and no original rating tweets.

In [32]:
# remove tweets with retweeted_status_id
#twitter_archive_df[~twitter_archive_df['retweeted_status_id'].isnull()] # for exploration
rows_before = twitter_archive_df_cleaned.shape[0]
twitter_archive_df_cleaned = twitter_archive_df_cleaned[twitter_archive_df_cleaned['retweeted_status_id'].isnull()]
print("Dropped ", rows_before-twitter_archive_df_cleaned.shape[0], " due to re-tweets.")
Dropped  116  due to re-tweets.

If the denominator is not 10, then it is no valid rating. This rows will be removed as well:

In [33]:
# no valid rating
#twitter_archive_df_cleaned[twitter_archive_df_cleaned.rating_denominator!= 10] # for exploration
rows_before = twitter_archive_df_cleaned.shape[0]
twitter_archive_df_cleaned = twitter_archive_df_cleaned[twitter_archive_df_cleaned.rating_denominator== 10] 
print("Dropped ", rows_before-twitter_archive_df_cleaned.shape[0], " due to invalid rating denominator.")
Dropped  6  due to invalid rating denominator.

If the neural network can not predict that it is a dog from the image, then it will be removed as well:

In [34]:
#prediction_df_cleaned[prediction_df.p1_dog == False] # for exploration
rows_before = prediction_df_cleaned.shape[0]
prediction_df_cleaned = prediction_df_cleaned[prediction_df.p1_dog == True]
print("Dropped ", rows_before-prediction_df_cleaned.shape[0], " due to no-dog prediction.")
Dropped  543  due to no-dog prediction.

Remove all tweets after August 1st 2017:

In [35]:
# drop all newer than August 1st 2017
#tweets_df_cleaned[tweets_df_cleaned.created_at > '2017-08-01'] # to explore
rows_before = tweets_df_cleaned.shape[0]
tweets_df_cleaned = tweets_df_cleaned[tweets_df_cleaned.created_at < '2017-08-01']
print("Dropped ", rows_before-tweets_df_cleaned.shape[0], " due to date limit")
rows_before = twitter_archive_df_cleaned.shape[0]
twitter_archive_df_cleaned = twitter_archive_df_cleaned[twitter_archive_df_cleaned.timestamp < '2017-08-01']
print("Dropped ", rows_before-twitter_archive_df_cleaned.shape[0], " due to date limit")
Dropped  2  due to date limit
Dropped  2  due to date limit

2.2 Tidying

In [36]:
print("rows of twitter_archive_df_cleaned: ", twitter_archive_df_cleaned.shape[0])
print("rows of prediction_df_cleaned: ", prediction_df_cleaned.shape[0])
print("rows of tweets_df_cleaned: ", tweets_df_cleaned.shape[0])
rows of twitter_archive_df_cleaned:  1487
rows of prediction_df_cleaned:  1532
rows of tweets_df_cleaned:  2335
In [37]:
print("columns of twitter_archive_df_cleaned: ", twitter_archive_df_cleaned.shape[1])
print("columns of prediction_df_cleaned: ", prediction_df_cleaned.shape[1])
print("columns of tweets_df_cleaned: ", tweets_df_cleaned.shape[1])
columns of twitter_archive_df_cleaned:  17
columns of prediction_df_cleaned:  12
columns of tweets_df_cleaned:  5

Merge all dataframes:

In [38]:
# join twitter_archive_df_cleaned and prediction_df_cleaned
joined_df = twitter_archive_df_cleaned.join(prediction_df_cleaned.set_index('tweet_id'), on='tweet_id', how='inner')
In [39]:
joined_df.shape
Out[39]:
(1109, 28)
In [40]:
# join also with tweets_df_cleaned
joined_df = joined_df.join(tweets_df_cleaned.set_index('tweet_id'), on='tweet_id', how='inner', lsuffix='', rsuffix='_otweet')
In [41]:
joined_df.shape
Out[41]:
(1105, 32)
In [42]:
joined_df.head(2)
Out[42]:
tweet_id in_reply_to_status_id in_reply_to_user_id timestamp source text retweeted_status_id retweeted_status_user_id retweeted_status_timestamp expanded_urls ... p2 p2_conf p2_dog p3 p3_conf p3_dog text_otweet favorite_count retweet_count created_at
2 891815181378084864 NaN NaN 2017-07-31 00:18:03+00:00 <a href="http://twitter.com/download/iphone" r... This is Archie. He is a rare Norwegian Pouncin... NaN NaN NaN https://twitter.com/dog_rates/status/891815181... ... malamute 0.078253 True kelpie 0.031379 True This is Archie. He is a rare Norwegian Pouncin... 24348 4009 2017-07-31 00:18:03+00:00
4 891327558926688256 NaN NaN 2017-07-29 16:00:24+00:00 <a href="http://twitter.com/download/iphone" r... This is Franklin. He would like you to stop ca... NaN NaN NaN https://twitter.com/dog_rates/status/891327558... ... English_springer 0.225770 True German_short-haired_pointer 0.175219 True This is Franklin. He would like you to stop ca... 39127 9050 2017-07-29 16:00:24+00:00

2 rows × 32 columns

Check if the duplicated column for timestamps have the same content:

In [43]:
# check if timestamps are all the same after joining
joined_df[joined_df['timestamp'] == joined_df['created_at']]
Out[43]:
tweet_id in_reply_to_status_id in_reply_to_user_id timestamp source text retweeted_status_id retweeted_status_user_id retweeted_status_timestamp expanded_urls ... p2 p2_conf p2_dog p3 p3_conf p3_dog text_otweet favorite_count retweet_count created_at
2 891815181378084864 NaN NaN 2017-07-31 00:18:03+00:00 <a href="http://twitter.com/download/iphone" r... This is Archie. He is a rare Norwegian Pouncin... NaN NaN NaN https://twitter.com/dog_rates/status/891815181... ... malamute 0.078253 True kelpie 0.031379 True This is Archie. He is a rare Norwegian Pouncin... 24348 4009 2017-07-31 00:18:03+00:00
4 891327558926688256 NaN NaN 2017-07-29 16:00:24+00:00 <a href="http://twitter.com/download/iphone" r... This is Franklin. He would like you to stop ca... NaN NaN NaN https://twitter.com/dog_rates/status/891327558... ... English_springer 0.225770 True German_short-haired_pointer 0.175219 True This is Franklin. He would like you to stop ca... 39127 9050 2017-07-29 16:00:24+00:00
6 890971913173991426 NaN NaN 2017-07-28 16:27:12+00:00 <a href="http://twitter.com/download/iphone" r... Meet Jax. He enjoys ice cream so much he gets ... NaN NaN NaN https://gofundme.com/ydvmve-surgery-for-jax,ht... ... Border_collie 0.199287 True ice_lolly 0.193548 False Meet Jax. He enjoys ice cream so much he gets ... 11503 1985 2017-07-28 16:27:12+00:00
8 890609185150312448 NaN NaN 2017-07-27 16:25:51+00:00 <a href="http://twitter.com/download/iphone" r... This is Zoey. She doesn't want to be one of th... NaN NaN NaN https://twitter.com/dog_rates/status/890609185... ... Irish_setter 0.193054 True Chesapeake_Bay_retriever 0.118184 True This is Zoey. She doesn't want to be one of th... 27059 4122 2017-07-27 16:25:51+00:00
9 890240255349198849 NaN NaN 2017-07-26 15:59:51+00:00 <a href="http://twitter.com/download/iphone" r... This is Cassie. She is a college pup. Studying... NaN NaN NaN https://twitter.com/dog_rates/status/890240255... ... Cardigan 0.451038 True Chihuahua 0.029248 True This is Cassie. She is a college pup. Studying... 31004 7120 2017-07-26 15:59:51+00:00
10 890006608113172480 NaN NaN 2017-07-26 00:31:25+00:00 <a href="http://twitter.com/download/iphone" r... This is Koda. He is a South Australian decksha... NaN NaN NaN https://twitter.com/dog_rates/status/890006608... ... Pomeranian 0.013884 True chow 0.008167 True This is Koda. He is a South Australian decksha... 29817 7078 2017-07-26 00:31:25+00:00
11 889880896479866881 NaN NaN 2017-07-25 16:11:53+00:00 <a href="http://twitter.com/download/iphone" r... This is Bruno. He is a service shark. Only get... NaN NaN NaN https://twitter.com/dog_rates/status/889880896... ... Labrador_retriever 0.151317 True muzzle 0.082981 False This is Bruno. He is a service shark. Only get... 27032 4810 2017-07-25 16:11:53+00:00
13 889638837579907072 NaN NaN 2017-07-25 00:10:02+00:00 <a href="http://twitter.com/download/iphone" r... This is Ted. He does his best. Sometimes that'... NaN NaN NaN https://twitter.com/dog_rates/status/889638837... ... boxer 0.002129 True Staffordshire_bullterrier 0.001498 True This is Ted. He does his best. Sometimes that'... 26357 4377 2017-07-25 00:10:02+00:00
14 889531135344209921 NaN NaN 2017-07-24 17:02:04+00:00 <a href="http://twitter.com/download/iphone" r... This is Stuart. He's sporting his favorite fan... NaN NaN NaN https://twitter.com/dog_rates/status/889531135... ... Labrador_retriever 0.013834 True redbone 0.007958 True This is Stuart. He's sporting his favorite fan... 14698 2173 2017-07-24 17:02:04+00:00
15 889278841981685760 NaN NaN 2017-07-24 00:19:32+00:00 <a href="http://twitter.com/download/iphone" r... This is Oliver. You're witnessing one of his m... NaN NaN NaN https://twitter.com/dog_rates/status/889278841... ... borzoi 0.194742 True Saluki 0.027351 True This is Oliver. You're witnessing one of his m... 24543 5197 2017-07-24 00:19:32+00:00
16 888917238123831296 NaN NaN 2017-07-23 00:22:39+00:00 <a href="http://twitter.com/download/iphone" r... This is Jim. He found a fren. Taught him how t... NaN NaN NaN https://twitter.com/dog_rates/status/888917238... ... Tibetan_mastiff 0.120184 True Labrador_retriever 0.105506 True This is Jim. He found a fren. Taught him how t... 28317 4348 2017-07-23 00:22:39+00:00
17 888804989199671297 NaN NaN 2017-07-22 16:56:37+00:00 <a href="http://twitter.com/download/iphone" r... This is Zeke. He has a new stick. Very proud o... NaN NaN NaN https://twitter.com/dog_rates/status/888804989... ... Labrador_retriever 0.184172 True English_setter 0.073482 True This is Zeke. He has a new stick. Very proud o... 24836 4132 2017-07-22 16:56:37+00:00
18 888554962724278272 NaN NaN 2017-07-22 00:23:06+00:00 <a href="http://twitter.com/download/iphone" r... This is Ralphus. He's powering up. Attempting ... NaN NaN NaN https://twitter.com/dog_rates/status/888554962... ... Eskimo_dog 0.166511 True malamute 0.111411 True This is Ralphus. He's powering up. Attempting ... 19274 3415 2017-07-22 00:23:06+00:00
20 888078434458587136 NaN NaN 2017-07-20 16:49:33+00:00 <a href="http://twitter.com/download/iphone" r... This is Gerald. He was just told he didn't get... NaN NaN NaN https://twitter.com/dog_rates/status/888078434... ... pug 0.000932 True bull_mastiff 0.000903 True This is Gerald. He was just told he didn't get... 21153 3363 2017-07-20 16:49:33+00:00
21 887705289381826560 NaN NaN 2017-07-19 16:06:48+00:00 <a href="http://twitter.com/download/iphone" r... This is Jeffrey. He has a monopoly on the pool... NaN NaN NaN https://twitter.com/dog_rates/status/887705289... ... redbone 0.087582 True Weimaraner 0.026236 True This is Jeffrey. He has a monopoly on the pool... 29364 5193 2017-07-19 16:06:48+00:00
23 887473957103951883 NaN NaN 2017-07-19 00:47:34+00:00 <a href="http://twitter.com/download/iphone" r... This is Canela. She attempted some fancy porch... NaN NaN NaN https://twitter.com/dog_rates/status/887473957... ... Rhodesian_ridgeback 0.054950 True beagle 0.038915 True This is Canela. She attempted some fancy porch... 67002 17535 2017-07-19 00:47:34+00:00
26 886983233522544640 NaN NaN 2017-07-17 16:17:36+00:00 <a href="http://twitter.com/download/iphone" r... This is Maya. She's very shy. Rarely leaves he... NaN NaN NaN https://twitter.com/dog_rates/status/886983233... ... toy_terrier 0.143528 True can_opener 0.032253 False This is Maya. She's very shy. Rarely leaves he... 34141 7500 2017-07-17 16:17:36+00:00
27 886736880519319552 NaN NaN 2017-07-16 23:58:41+00:00 <a href="http://twitter.com/download/iphone" r... This is Mingus. He's a wonderful father to his... NaN NaN NaN https://www.gofundme.com/mingusneedsus,https:/... ... Great_Pyrenees 0.186136 True Dandie_Dinmont 0.086346 True This is Mingus. He's a wonderful father to his... 11703 3152 2017-07-16 23:58:41+00:00
29 886366144734445568 NaN NaN 2017-07-15 23:25:31+00:00 <a href="http://twitter.com/download/iphone" r... This is Roscoe. Another pupper fallen victim t... NaN NaN NaN https://twitter.com/dog_rates/status/886366144... ... Chihuahua 0.000361 True Boston_bull 0.000076 True This is Roscoe. Another pupper fallen victim t... 20591 3090 2017-07-15 23:25:31+00:00
31 886258384151887873 NaN NaN 2017-07-15 16:17:19+00:00 <a href="http://twitter.com/download/iphone" r... This is Waffles. His doggles are pupside down.... NaN NaN NaN https://twitter.com/dog_rates/status/886258384... ... shower_cap 0.025286 False Siamese_cat 0.002849 False This is Waffles. His doggles are pupside down.... 27219 6091 2017-07-15 16:17:19+00:00
33 885984800019947520 NaN NaN 2017-07-14 22:10:11+00:00 <a href="http://twitter.com/download/iphone" r... Viewer discretion advised. This is Jimbo. He w... NaN NaN NaN https://twitter.com/dog_rates/status/885984800... ... Shih-Tzu 0.006630 True Bernese_mountain_dog 0.006239 True Viewer discretion advised. This is Jimbo. He w... 31687 6549 2017-07-14 22:10:11+00:00
34 885528943205470208 NaN NaN 2017-07-13 15:58:47+00:00 <a href="http://twitter.com/download/iphone" r... This is Maisey. She fell asleep mid-excavation... NaN NaN NaN https://twitter.com/dog_rates/status/885528943... ... Labrador_retriever 0.265835 True kuvasz 0.134697 True This is Maisey. She fell asleep mid-excavation... 34974 6215 2017-07-13 15:58:47+00:00
38 884925521741709313 NaN NaN 2017-07-12 00:01:00+00:00 <a href="http://twitter.com/download/iphone" r... This is Earl. He found a hat. Nervous about wh... NaN NaN NaN https://twitter.com/dog_rates/status/884925521... ... American_Staffordshire_terrier 0.198451 True Staffordshire_bullterrier 0.127725 True This is Earl. He found a hat. Nervous about wh... 75166 17618 2017-07-12 00:01:00+00:00
39 884876753390489601 NaN NaN 2017-07-11 20:47:12+00:00 <a href="http://twitter.com/download/iphone" r... This is Lola. It's her first time outside. Mus... NaN NaN NaN https://twitter.com/dog_rates/status/884876753... ... Norwich_terrier 0.106075 True Norfolk_terrier 0.037348 True This is Lola. It's her first time outside. Mus... 27116 5427 2017-07-11 20:47:12+00:00
40 884562892145688576 NaN NaN 2017-07-11 00:00:02+00:00 <a href="http://twitter.com/download/iphone" r... This is Kevin. He's just so happy. 13/10 what ... NaN NaN NaN https://twitter.com/dog_rates/status/884562892... ... French_bulldog 0.404291 True Brabancon_griffon 0.044002 True This is Kevin. He's just so happy. 13/10 what ... 23608 4526 2017-07-11 00:00:02+00:00
43 884162670584377345 NaN NaN 2017-07-09 21:29:42+00:00 <a href="http://twitter.com/download/iphone" r... Meet Yogi. He doesn't have any important dog m... NaN NaN NaN https://twitter.com/dog_rates/status/884162670... ... malinois 0.199396 True Norwegian_elkhound 0.049148 True Meet Yogi. He doesn't have any important dog m... 19790 2882 2017-07-09 21:29:42+00:00
44 883838122936631299 NaN NaN 2017-07-09 00:00:04+00:00 <a href="http://twitter.com/download/iphone" r... This is Noah. He can't believe someone made th... NaN NaN NaN https://twitter.com/dog_rates/status/883838122... ... miniature_pinscher 0.299603 True kelpie 0.063020 True This is Noah. He can't believe someone made th... 21273 3331 2017-07-09 00:00:04+00:00
45 883482846933004288 NaN NaN 2017-07-08 00:28:19+00:00 <a href="http://twitter.com/download/iphone" r... This is Bella. She hopes her smile made you sm... NaN NaN NaN https://twitter.com/dog_rates/status/883482846... ... Labrador_retriever 0.032409 True kuvasz 0.005501 True This is Bella. She hopes her smile made you sm... 44625 9594 2017-07-08 00:28:19+00:00
46 883360690899218434 NaN NaN 2017-07-07 16:22:55+00:00 <a href="http://twitter.com/download/iphone" r... Meet Grizzwald. He may be the floofiest floofe... NaN NaN NaN https://twitter.com/dog_rates/status/883360690... ... Tibetan_mastiff 0.007099 True Newfoundland 0.002140 True Meet Grizzwald. He may be the floofiest floofe... 22046 3586 2017-07-07 16:22:55+00:00
48 882992080364220416 NaN NaN 2017-07-06 15:58:11+00:00 <a href="http://twitter.com/download/iphone" r... This is Rusty. He wasn't ready for the first p... NaN NaN NaN https://twitter.com/dog_rates/status/882992080... ... Siberian_husky 0.406044 True dingo 0.073414 False This is Rusty. He wasn't ready for the first p... 23263 3795 2017-07-06 15:58:11+00:00
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2283 667200525029539841 NaN NaN 2015-11-19 04:39:35+00:00 <a href="http://twitter.com/download/iphone" r... This is Joshwa. He is a fuckboy supreme. He cl... NaN NaN NaN https://twitter.com/dog_rates/status/667200525... ... malamute 0.232006 True Eskimo_dog 0.050635 True This is Joshwa. He is a fuckboy supreme. He cl... 607 260 2015-11-19 04:39:35+00:00
2286 667182792070062081 NaN NaN 2015-11-19 03:29:07+00:00 <a href="http://twitter.com/download/iphone" r... This is Timison. He just told an awful joke bu... NaN NaN NaN https://twitter.com/dog_rates/status/667182792... ... Irish_setter 0.010564 True Chesapeake_Bay_retriever 0.005821 True This is Timison. He just told an awful joke bu... 14245 6185 2015-11-19 03:29:07+00:00
2287 667177989038297088 NaN NaN 2015-11-19 03:10:02+00:00 <a href="http://twitter.com/download/iphone" r... This is a Dasani Kingfisher from Maine. His na... NaN NaN NaN https://twitter.com/dog_rates/status/667177989... ... Chesapeake_Bay_retriever 0.176293 True Weimaraner 0.112369 True This is a Dasani Kingfisher from Maine. His na... 190 54 2015-11-19 03:10:02+00:00
2289 667174963120574464 NaN NaN 2015-11-19 02:58:01+00:00 <a href="http://twitter.com/download/iphone" r... This is Clarence. His face says he doesn't wan... NaN NaN NaN https://twitter.com/dog_rates/status/667174963... ... Chihuahua 0.243223 True bluetick 0.072806 True This is Clarence. His face says he doesn't wan... 239 82 2015-11-19 02:58:01+00:00
2290 667171260800061440 NaN NaN 2015-11-19 02:43:18+00:00 <a href="http://twitter.com/download/iphone" r... Say hello to Kenneth. He likes Reese's Puffs. ... NaN NaN NaN https://twitter.com/dog_rates/status/667171260... ... Lakeland_terrier 0.052744 True Irish_water_spaniel 0.034402 True Say hello to Kenneth. He likes Reese's Puffs. ... 219 88 2015-11-19 02:43:18+00:00
2291 667165590075940865 NaN NaN 2015-11-19 02:20:46+00:00 <a href="http://twitter.com/download/iphone" r... This is Churlie. AKA Fetty Woof. Lost eye savi... NaN NaN NaN https://twitter.com/dog_rates/status/667165590... ... Rottweiler 0.134094 True beagle 0.081900 True This is Churlie. AKA Fetty Woof. Lost eye savi... 2669 1134 2015-11-19 02:20:46+00:00
2292 667160273090932737 NaN NaN 2015-11-19 01:59:39+00:00 <a href="http://twitter.com/download/iphone" r... This is Bradlay. He is a Ronaldinho Matsuyama ... NaN NaN NaN https://twitter.com/dog_rates/status/667160273... ... miniature_poodle 0.091992 True standard_poodle 0.087385 True This is Bradlay. He is a Ronaldinho Matsuyama ... 254 62 2015-11-19 01:59:39+00:00
2293 667152164079423490 NaN NaN 2015-11-19 01:27:25+00:00 <a href="http://twitter.com/download/iphone" r... This is Pipsy. He is a fluffball. Enjoys trave... NaN NaN NaN https://twitter.com/dog_rates/status/667152164... ... Pomeranian 0.087544 True miniature_poodle 0.062050 True This is Pipsy. He is a fluffball. Enjoys trave... 47349 16988 2015-11-19 01:27:25+00:00
2295 667119796878725120 NaN NaN 2015-11-18 23:18:48+00:00 <a href="http://twitter.com/download/iphone" r... This is Gabe. He is a southern Baklava. Gabe h... NaN NaN NaN https://twitter.com/dog_rates/status/667119796... ... Chihuahua 0.057866 True toy_poodle 0.039125 True This is Gabe. He is a southern Baklava. Gabe h... 327 127 2015-11-18 23:18:48+00:00
2296 667090893657276420 NaN NaN 2015-11-18 21:23:57+00:00 <a href="http://twitter.com/download/iphone" r... This is Clybe. He is an Anemone Valdez. One ea... NaN NaN NaN https://twitter.com/dog_rates/status/667090893... ... Italian_greyhound 0.005370 True Pomeranian 0.002641 True This is Clybe. He is an Anemone Valdez. One ea... 328 126 2015-11-18 21:23:57+00:00
2297 667073648344346624 NaN NaN 2015-11-18 20:15:26+00:00 <a href="http://twitter.com/download/iphone" r... Here is Dave. He is actually just a skinny leg... NaN NaN NaN https://twitter.com/dog_rates/status/667073648... ... pug 0.092494 True Brabancon_griffon 0.057495 True Here is Dave. He is actually just a skinny leg... 398 122 2015-11-18 20:15:26+00:00
2300 667062181243039745 NaN NaN 2015-11-18 19:29:52+00:00 <a href="http://twitter.com/download/iphone" r... This is Keet. He is a Floridian Amukamara. Abs... NaN NaN NaN https://twitter.com/dog_rates/status/667062181... ... vizsla 0.090998 True kelpie 0.022956 True This is Keet. He is a Floridian Amukamara. Abs... 215 54 2015-11-18 19:29:52+00:00
2308 666817836334096384 NaN NaN 2015-11-18 03:18:55+00:00 <a href="http://twitter.com/download/iphone" r... This is Jeph. He is a German Boston Shuttlecoc... NaN NaN NaN https://twitter.com/dog_rates/status/666817836... ... standard_schnauzer 0.285276 True giant_schnauzer 0.073764 True This is Jeph. He is a German Boston Shuttlecoc... 512 248 2015-11-18 03:18:55+00:00
2309 666804364988780544 NaN NaN 2015-11-18 02:25:23+00:00 <a href="http://twitter.com/download/iphone" r... This is Jockson. He is a Pinnacle Sagittarius.... NaN NaN NaN https://twitter.com/dog_rates/status/666804364... ... Brittany_spaniel 0.283545 True Ibizan_hound 0.057461 True This is Jockson. He is a Pinnacle Sagittarius.... 236 92 2015-11-18 02:25:23+00:00
2311 666781792255496192 NaN NaN 2015-11-18 00:55:42+00:00 <a href="http://twitter.com/download/iphone" r... This is a purebred Bacardi named Octaviath. Ca... NaN NaN NaN https://twitter.com/dog_rates/status/666781792... ... Weimaraner 0.151363 True vizsla 0.085989 True This is a purebred Bacardi named Octaviath. Ca... 380 189 2015-11-18 00:55:42+00:00
2313 666739327293083650 NaN NaN 2015-11-17 22:06:57+00:00 <a href="http://twitter.com/download/iphone" r... This is Lugan. He is a Bohemian Rhapsody. Very... NaN NaN NaN https://twitter.com/dog_rates/status/666739327... ... cocker_spaniel 0.165255 True toy_poodle 0.095959 True This is Lugan. He is a Bohemian Rhapsody. Very... 232 65 2015-11-17 22:06:57+00:00
2314 666701168228331520 NaN NaN 2015-11-17 19:35:19+00:00 <a href="http://twitter.com/download/iphone" r... This is a golden Buckminsterfullerene named Jo... NaN NaN NaN https://twitter.com/dog_rates/status/666701168... ... Chihuahua 0.029307 True French_bulldog 0.020756 True This is a golden Buckminsterfullerene named Jo... 425 216 2015-11-17 19:35:19+00:00
2315 666691418707132416 NaN NaN 2015-11-17 18:56:35+00:00 <a href="http://twitter.com/download/iphone" r... This is Christoper. He is a spotted Penne. Can... NaN NaN NaN https://twitter.com/dog_rates/status/666691418... ... beagle 0.008687 True bloodhound 0.005394 True This is Christoper. He is a spotted Penne. Can... 186 47 2015-11-17 18:56:35+00:00
2317 666644823164719104 NaN NaN 2015-11-17 15:51:26+00:00 <a href="http://twitter.com/download/iphone" r... This is Jimothy. He is a Botwanian Gouda. Can ... NaN NaN NaN https://twitter.com/dog_rates/status/666644823... ... Pembroke 0.043209 True West_Highland_white_terrier 0.038906 True This is Jimothy. He is a Botwanian Gouda. Can ... 228 79 2015-11-17 15:51:26+00:00
2318 666454714377183233 NaN NaN 2015-11-17 03:16:00+00:00 <a href="http://twitter.com/download/iphone" r... I'll name the dogs from now on. This is Kreggo... NaN NaN NaN https://twitter.com/dog_rates/status/666454714... ... Labrador_retriever 0.237612 True Great_Pyrenees 0.171106 True I'll name the dogs from now on. This is Kreggo... 510 207 2015-11-17 03:16:00+00:00
2319 666447344410484738 NaN NaN 2015-11-17 02:46:43+00:00 <a href="http://twitter.com/download/iphone" r... This is Scout. She is a black Downton Abbey. I... NaN NaN NaN https://twitter.com/dog_rates/status/666447344... ... giant_schnauzer 0.287955 True Labrador_retriever 0.166331 True This is Scout. She is a black Downton Abbey. I... 101 19 2015-11-17 02:46:43+00:00
2325 666418789513326592 NaN NaN 2015-11-17 00:53:15+00:00 <a href="http://twitter.com/download/iphone" r... This is Walter. He is an Alaskan Terrapin. Lov... NaN NaN NaN https://twitter.com/dog_rates/status/666418789... ... papillon 0.148258 True Chihuahua 0.142860 True This is Walter. He is an Alaskan Terrapin. Lov... 121 45 2015-11-17 00:53:15+00:00
2327 666407126856765440 NaN NaN 2015-11-17 00:06:54+00:00 <a href="http://twitter.com/download/iphone" r... This is a southern Vesuvius bumblegruff. Can d... NaN NaN NaN https://twitter.com/dog_rates/status/666407126... ... bloodhound 0.244220 True flat-coated_retriever 0.173810 True This is a southern Vesuvius bumblegruff. Can d... 105 38 2015-11-17 00:06:54+00:00
2345 666063827256086533 NaN NaN 2015-11-16 01:22:45+00:00 <a href="http://twitter.com/download/iphone" r... This is the happiest dog you will ever see. Ve... NaN NaN NaN https://twitter.com/dog_rates/status/666063827... ... Tibetan_mastiff 0.093718 True Labrador_retriever 0.072427 True This is the happiest dog you will ever see. Ve... 461 212 2015-11-16 01:22:45+00:00
2346 666058600524156928 NaN NaN 2015-11-16 01:01:59+00:00 <a href="http://twitter.com/download/iphone" r... Here is the Rand Paul of retrievers folks! He'... NaN NaN NaN https://twitter.com/dog_rates/status/666058600... ... komondor 0.192305 True soft-coated_wheaten_terrier 0.082086 True Here is the Rand Paul of retrievers folks! He'... 109 57 2015-11-16 01:01:59+00:00
2348 666055525042405380 NaN NaN 2015-11-16 00:49:46+00:00 <a href="http://twitter.com/download/iphone" r... Here is a Siberian heavily armored polar bear ... NaN NaN NaN https://twitter.com/dog_rates/status/666055525... ... Tibetan_mastiff 0.058279 True fur_coat 0.054449 False Here is a Siberian heavily armored polar bear ... 426 235 2015-11-16 00:49:46+00:00
2350 666050758794694657 NaN NaN 2015-11-16 00:30:50+00:00 <a href="http://twitter.com/download/iphone" r... This is a truly beautiful English Wilson Staff... NaN NaN NaN https://twitter.com/dog_rates/status/666050758... ... English_springer 0.263788 True Greater_Swiss_Mountain_dog 0.016199 True This is a truly beautiful English Wilson Staff... 130 57 2015-11-16 00:30:50+00:00
2352 666044226329800704 NaN NaN 2015-11-16 00:04:52+00:00 <a href="http://twitter.com/download/iphone" r... This is a purebred Piers Morgan. Loves to Netf... NaN NaN NaN https://twitter.com/dog_rates/status/666044226... ... redbone 0.360687 True miniature_pinscher 0.222752 True This is a purebred Piers Morgan. Loves to Netf... 289 136 2015-11-16 00:04:52+00:00
2353 666033412701032449 NaN NaN 2015-11-15 23:21:54+00:00 <a href="http://twitter.com/download/iphone" r... Here is a very happy pup. Big fan of well-main... NaN NaN NaN https://twitter.com/dog_rates/status/666033412... ... malinois 0.138584 True bloodhound 0.116197 True Here is a very happy pup. Big fan of well-main... 121 43 2015-11-15 23:21:54+00:00
2354 666029285002620928 NaN NaN 2015-11-15 23:05:30+00:00 <a href="http://twitter.com/download/iphone" r... This is a western brown Mitsubishi terrier. Up... NaN NaN NaN https://twitter.com/dog_rates/status/666029285... ... miniature_pinscher 0.074192 True Rhodesian_ridgeback 0.072010 True This is a western brown Mitsubishi terrier. Up... 125 46 2015-11-15 23:05:30+00:00

1105 rows × 32 columns

In [44]:
#joined_df[joined_df['jpg_url'] == joined_df['expanded_urls']] # for exploration

Remove all columns that are duplicates or that are not needed for later analysis:

In [45]:
# remove needless columns
joined_df.drop(columns=['created_at', 'in_reply_to_status_id', 'in_reply_to_user_id', 'source', 'retweeted_status_id', 'retweeted_status_user_id', 'retweeted_status_timestamp', 'text_otweet', 'expanded_urls'], inplace=True)
In [46]:
joined_df
Out[46]:
tweet_id timestamp text rating_numerator rating_denominator name doggo floofer pupper puppo ... p1_conf p1_dog p2 p2_conf p2_dog p3 p3_conf p3_dog favorite_count retweet_count
2 891815181378084864 2017-07-31 00:18:03+00:00 This is Archie. He is a rare Norwegian Pouncin... 12 10 Archie False False False False ... 0.716012 True malamute 0.078253 True kelpie 0.031379 True 24348 4009
4 891327558926688256 2017-07-29 16:00:24+00:00 This is Franklin. He would like you to stop ca... 12 10 Franklin False False False False ... 0.555712 True English_springer 0.225770 True German_short-haired_pointer 0.175219 True 39127 9050
6 890971913173991426 2017-07-28 16:27:12+00:00 Meet Jax. He enjoys ice cream so much he gets ... 13 10 Jax False False False False ... 0.341703 True Border_collie 0.199287 True ice_lolly 0.193548 False 11503 1985
8 890609185150312448 2017-07-27 16:25:51+00:00 This is Zoey. She doesn't want to be one of th... 13 10 Zoey False False False False ... 0.487574 True Irish_setter 0.193054 True Chesapeake_Bay_retriever 0.118184 True 27059 4122
9 890240255349198849 2017-07-26 15:59:51+00:00 This is Cassie. She is a college pup. Studying... 14 10 Cassie True False False False ... 0.511319 True Cardigan 0.451038 True Chihuahua 0.029248 True 31004 7120
10 890006608113172480 2017-07-26 00:31:25+00:00 This is Koda. He is a South Australian decksha... 13 10 Koda False False False False ... 0.957979 True Pomeranian 0.013884 True chow 0.008167 True 29817 7078
11 889880896479866881 2017-07-25 16:11:53+00:00 This is Bruno. He is a service shark. Only get... 13 10 Bruno False False False False ... 0.377417 True Labrador_retriever 0.151317 True muzzle 0.082981 False 27032 4810
13 889638837579907072 2017-07-25 00:10:02+00:00 This is Ted. He does his best. Sometimes that'... 12 10 Ted False False False False ... 0.991650 True boxer 0.002129 True Staffordshire_bullterrier 0.001498 True 26357 4377
14 889531135344209921 2017-07-24 17:02:04+00:00 This is Stuart. He's sporting his favorite fan... 13 10 Stuart False False False True ... 0.953442 True Labrador_retriever 0.013834 True redbone 0.007958 True 14698 2173
15 889278841981685760 2017-07-24 00:19:32+00:00 This is Oliver. You're witnessing one of his m... 13 10 Oliver False False False False ... 0.626152 True borzoi 0.194742 True Saluki 0.027351 True 24543 5197
16 888917238123831296 2017-07-23 00:22:39+00:00 This is Jim. He found a fren. Taught him how t... 12 10 Jim False False False False ... 0.714719 True Tibetan_mastiff 0.120184 True Labrador_retriever 0.105506 True 28317 4348
17 888804989199671297 2017-07-22 16:56:37+00:00 This is Zeke. He has a new stick. Very proud o... 13 10 Zeke False False False False ... 0.469760 True Labrador_retriever 0.184172 True English_setter 0.073482 True 24836 4132
18 888554962724278272 2017-07-22 00:23:06+00:00 This is Ralphus. He's powering up. Attempting ... 13 10 Ralphus False False False False ... 0.700377 True Eskimo_dog 0.166511 True malamute 0.111411 True 19274 3415
20 888078434458587136 2017-07-20 16:49:33+00:00 This is Gerald. He was just told he didn't get... 12 10 Gerald False False False False ... 0.995026 True pug 0.000932 True bull_mastiff 0.000903 True 21153 3363
21 887705289381826560 2017-07-19 16:06:48+00:00 This is Jeffrey. He has a monopoly on the pool... 13 10 Jeffrey False False False False ... 0.821664 True redbone 0.087582 True Weimaraner 0.026236 True 29364 5193
23 887473957103951883 2017-07-19 00:47:34+00:00 This is Canela. She attempted some fancy porch... 13 10 Canela False False False False ... 0.809197 True Rhodesian_ridgeback 0.054950 True beagle 0.038915 True 67002 17535
26 886983233522544640 2017-07-17 16:17:36+00:00 This is Maya. She's very shy. Rarely leaves he... 13 10 Maya False False False False ... 0.793469 True toy_terrier 0.143528 True can_opener 0.032253 False 34141 7500
27 886736880519319552 2017-07-16 23:58:41+00:00 This is Mingus. He's a wonderful father to his... 13 10 Mingus False False False False ... 0.309706 True Great_Pyrenees 0.186136 True Dandie_Dinmont 0.086346 True 11703 3152
29 886366144734445568 2017-07-15 23:25:31+00:00 This is Roscoe. Another pupper fallen victim t... 12 10 Roscoe False False True False ... 0.999201 True Chihuahua 0.000361 True Boston_bull 0.000076 True 20591 3090
31 886258384151887873 2017-07-15 16:17:19+00:00 This is Waffles. His doggles are pupside down.... 13 10 Waffles False False False False ... 0.943575 True shower_cap 0.025286 False Siamese_cat 0.002849 False 27219 6091
33 885984800019947520 2017-07-14 22:10:11+00:00 Viewer discretion advised. This is Jimbo. He w... 12 10 Jimbo False False False False ... 0.972494 True Shih-Tzu 0.006630 True Bernese_mountain_dog 0.006239 True 31687 6549
34 885528943205470208 2017-07-13 15:58:47+00:00 This is Maisey. She fell asleep mid-excavation... 13 10 Maisey False False False False ... 0.369275 True Labrador_retriever 0.265835 True kuvasz 0.134697 True 34974 6215
38 884925521741709313 2017-07-12 00:01:00+00:00 This is Earl. He found a hat. Nervous about wh... 12 10 Earl False False False False ... 0.259916 True American_Staffordshire_terrier 0.198451 True Staffordshire_bullterrier 0.127725 True 75166 17618
39 884876753390489601 2017-07-11 20:47:12+00:00 This is Lola. It's her first time outside. Mus... 13 10 Lola False False False False ... 0.822103 True Norwich_terrier 0.106075 True Norfolk_terrier 0.037348 True 27116 5427
40 884562892145688576 2017-07-11 00:00:02+00:00 This is Kevin. He's just so happy. 13/10 what ... 13 10 Kevin False False False False ... 0.546406 True French_bulldog 0.404291 True Brabancon_griffon 0.044002 True 23608 4526
43 884162670584377345 2017-07-09 21:29:42+00:00 Meet Yogi. He doesn't have any important dog m... 12 10 Yogi True False False False ... 0.707046 True malinois 0.199396 True Norwegian_elkhound 0.049148 True 19790 2882
44 883838122936631299 2017-07-09 00:00:04+00:00 This is Noah. He can't believe someone made th... 12 10 Noah False False False False ... 0.610946 True miniature_pinscher 0.299603 True kelpie 0.063020 True 21273 3331
45 883482846933004288 2017-07-08 00:28:19+00:00 This is Bella. She hopes her smile made you sm... 5 10 Bella False False False False ... 0.943082 True Labrador_retriever 0.032409 True kuvasz 0.005501 True 44625 9594
46 883360690899218434 2017-07-07 16:22:55+00:00 Meet Grizzwald. He may be the floofiest floofe... 13 10 Grizzwald False True False False ... 0.987997 True Tibetan_mastiff 0.007099 True Newfoundland 0.002140 True 22046 3586
48 882992080364220416 2017-07-06 15:58:11+00:00 This is Rusty. He wasn't ready for the first p... 13 10 Rusty False False False False ... 0.466778 True Siberian_husky 0.406044 True dingo 0.073414 False 23263 3795
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2283 667200525029539841 2015-11-19 04:39:35+00:00 This is Joshwa. He is a fuckboy supreme. He cl... 11 10 Joshwa False False False False ... 0.694904 True malamute 0.232006 True Eskimo_dog 0.050635 True 607 260
2286 667182792070062081 2015-11-19 03:29:07+00:00 This is Timison. He just told an awful joke bu... 10 10 Timison False False False False ... 0.949892 True Irish_setter 0.010564 True Chesapeake_Bay_retriever 0.005821 True 14245 6185
2287 667177989038297088 2015-11-19 03:10:02+00:00 This is a Dasani Kingfisher from Maine. His na... 8 10 a False False False False ... 0.259249 True Chesapeake_Bay_retriever 0.176293 True Weimaraner 0.112369 True 190 54
2289 667174963120574464 2015-11-19 02:58:01+00:00 This is Clarence. His face says he doesn't wan... 9 10 Clarence False False False False ... 0.266437 True Chihuahua 0.243223 True bluetick 0.072806 True 239 82
2290 667171260800061440 2015-11-19 02:43:18+00:00 Say hello to Kenneth. He likes Reese's Puffs. ... 10 10 Kenneth False False False False ... 0.841265 True Lakeland_terrier 0.052744 True Irish_water_spaniel 0.034402 True 219 88
2291 667165590075940865 2015-11-19 02:20:46+00:00 This is Churlie. AKA Fetty Woof. Lost eye savi... 10 10 Churlie False False False False ... 0.140173 True Rottweiler 0.134094 True beagle 0.081900 True 2669 1134
2292 667160273090932737 2015-11-19 01:59:39+00:00 This is Bradlay. He is a Ronaldinho Matsuyama ... 11 10 Bradlay False False False False ... 0.471351 True miniature_poodle 0.091992 True standard_poodle 0.087385 True 254 62
2293 667152164079423490 2015-11-19 01:27:25+00:00 This is Pipsy. He is a fluffball. Enjoys trave... 12 10 Pipsy False False False False ... 0.535411 True Pomeranian 0.087544 True miniature_poodle 0.062050 True 47349 16988
2295 667119796878725120 2015-11-18 23:18:48+00:00 This is Gabe. He is a southern Baklava. Gabe h... 10 10 Gabe False False False False ... 0.741563 True Chihuahua 0.057866 True toy_poodle 0.039125 True 327 127
2296 667090893657276420 2015-11-18 21:23:57+00:00 This is Clybe. He is an Anemone Valdez. One ea... 7 10 Clybe False False False False ... 0.959514 True Italian_greyhound 0.005370 True Pomeranian 0.002641 True 328 126
2297 667073648344346624 2015-11-18 20:15:26+00:00 Here is Dave. He is actually just a skinny leg... 10 10 Dave False False False False ... 0.483682 True pug 0.092494 True Brabancon_griffon 0.057495 True 398 122
2300 667062181243039745 2015-11-18 19:29:52+00:00 This is Keet. He is a Floridian Amukamara. Abs... 10 10 Keet False False False False ... 0.825678 True vizsla 0.090998 True kelpie 0.022956 True 215 54
2308 666817836334096384 2015-11-18 03:18:55+00:00 This is Jeph. He is a German Boston Shuttlecoc... 9 10 Jeph False False False False ... 0.496953 True standard_schnauzer 0.285276 True giant_schnauzer 0.073764 True 512 248
2309 666804364988780544 2015-11-18 02:25:23+00:00 This is Jockson. He is a Pinnacle Sagittarius.... 8 10 Jockson False False False False ... 0.328792 True Brittany_spaniel 0.283545 True Ibizan_hound 0.057461 True 236 92
2311 666781792255496192 2015-11-18 00:55:42+00:00 This is a purebred Bacardi named Octaviath. Ca... 10 10 a False False False False ... 0.618316 True Weimaraner 0.151363 True vizsla 0.085989 True 380 189
2313 666739327293083650 2015-11-17 22:06:57+00:00 This is Lugan. He is a Bohemian Rhapsody. Very... 10 10 Lugan False False False False ... 0.546933 True cocker_spaniel 0.165255 True toy_poodle 0.095959 True 232 65
2314 666701168228331520 2015-11-17 19:35:19+00:00 This is a golden Buckminsterfullerene named Jo... 8 10 a False False False False ... 0.887707 True Chihuahua 0.029307 True French_bulldog 0.020756 True 425 216
2315 666691418707132416 2015-11-17 18:56:35+00:00 This is Christoper. He is a spotted Penne. Can... 8 10 Christoper False False False False ... 0.975401 True beagle 0.008687 True bloodhound 0.005394 True 186 47
2317 666644823164719104 2015-11-17 15:51:26+00:00 This is Jimothy. He is a Botwanian Gouda. Can ... 9 10 Jimothy False False False False ... 0.044333 True Pembroke 0.043209 True West_Highland_white_terrier 0.038906 True 228 79
2318 666454714377183233 2015-11-17 03:16:00+00:00 I'll name the dogs from now on. This is Kreggo... 10 10 Kreggory False False False False ... 0.278954 True Labrador_retriever 0.237612 True Great_Pyrenees 0.171106 True 510 207
2319 666447344410484738 2015-11-17 02:46:43+00:00 This is Scout. She is a black Downton Abbey. I... 9 10 Scout False False False False ... 0.322084 True giant_schnauzer 0.287955 True Labrador_retriever 0.166331 True 101 19
2325 666418789513326592 2015-11-17 00:53:15+00:00 This is Walter. He is an Alaskan Terrapin. Lov... 10 10 Walter False False False False ... 0.149680 True papillon 0.148258 True Chihuahua 0.142860 True 121 45
2327 666407126856765440 2015-11-17 00:06:54+00:00 This is a southern Vesuvius bumblegruff. Can d... 7 10 a False False False False ... 0.529139 True bloodhound 0.244220 True flat-coated_retriever 0.173810 True 105 38
2345 666063827256086533 2015-11-16 01:22:45+00:00 This is the happiest dog you will ever see. Ve... 10 10 the False False False False ... 0.775930 True Tibetan_mastiff 0.093718 True Labrador_retriever 0.072427 True 461 212
2346 666058600524156928 2015-11-16 01:01:59+00:00 Here is the Rand Paul of retrievers folks! He'... 8 10 the False False False False ... 0.201493 True komondor 0.192305 True soft-coated_wheaten_terrier 0.082086 True 109 57
2348 666055525042405380 2015-11-16 00:49:46+00:00 Here is a Siberian heavily armored polar bear ... 10 10 a False False False False ... 0.692517 True Tibetan_mastiff 0.058279 True fur_coat 0.054449 False 426 235
2350 666050758794694657 2015-11-16 00:30:50+00:00 This is a truly beautiful English Wilson Staff... 10 10 a False False False False ... 0.651137 True English_springer 0.263788 True Greater_Swiss_Mountain_dog 0.016199 True 130 57
2352 666044226329800704 2015-11-16 00:04:52+00:00 This is a purebred Piers Morgan. Loves to Netf... 6 10 a False False False False ... 0.408143 True redbone 0.360687 True miniature_pinscher 0.222752 True 289 136
2353 666033412701032449 2015-11-15 23:21:54+00:00 Here is a very happy pup. Big fan of well-main... 9 10 a False False False False ... 0.596461 True malinois 0.138584 True bloodhound 0.116197 True 121 43
2354 666029285002620928 2015-11-15 23:05:30+00:00 This is a western brown Mitsubishi terrier. Up... 7 10 a False False False False ... 0.506826 True miniature_pinscher 0.074192 True Rhodesian_ridgeback 0.072010 True 125 46

1105 rows × 23 columns

In [47]:
joined_df.columns
Out[47]:
Index(['tweet_id', 'timestamp', 'text', 'rating_numerator',
       'rating_denominator', 'name', 'doggo', 'floofer', 'pupper', 'puppo',
       'jpg_url', 'img_num', 'p1', 'p1_conf', 'p1_dog', 'p2', 'p2_conf',
       'p2_dog', 'p3', 'p3_conf', 'p3_dog', 'favorite_count', 'retweet_count'],
      dtype='object')

Store cleaned and tidies data

In [48]:
joined_df.to_csv('../data/twitter_archive_master.csv', index=False)
In [49]:
conn = sqlite3.connect("../data/twitter_archive_master.db")
joined_df.to_sql("twitter_archive_master", conn, if_exists="replace")
In [50]:
# test
df = pd.read_sql_query("select * from twitter_archive_master;", conn)
df.head()
Out[50]:
index tweet_id timestamp text rating_numerator rating_denominator name doggo floofer pupper ... p1_conf p1_dog p2 p2_conf p2_dog p3 p3_conf p3_dog favorite_count retweet_count
0 2 891815181378084864 2017-07-31 00:18:03+00:00 This is Archie. He is a rare Norwegian Pouncin... 12 10 Archie 0 0 0 ... 0.716012 1 malamute 0.078253 1 kelpie 0.031379 1 24348 4009
1 4 891327558926688256 2017-07-29 16:00:24+00:00 This is Franklin. He would like you to stop ca... 12 10 Franklin 0 0 0 ... 0.555712 1 English_springer 0.225770 1 German_short-haired_pointer 0.175219 1 39127 9050
2 6 890971913173991426 2017-07-28 16:27:12+00:00 Meet Jax. He enjoys ice cream so much he gets ... 13 10 Jax 0 0 0 ... 0.341703 1 Border_collie 0.199287 1 ice_lolly 0.193548 0 11503 1985
3 8 890609185150312448 2017-07-27 16:25:51+00:00 This is Zoey. She doesn't want to be one of th... 13 10 Zoey 0 0 0 ... 0.487574 1 Irish_setter 0.193054 1 Chesapeake_Bay_retriever 0.118184 1 27059 4122
4 9 890240255349198849 2017-07-26 15:59:51+00:00 This is Cassie. She is a college pup. Studying... 14 10 Cassie 1 0 0 ... 0.511319 1 Cardigan 0.451038 1 Chihuahua 0.029248 1 31004 7120

5 rows × 24 columns

Analyze and Visualize

Most rated value

the most rated values and the highest rated value:

In [51]:
df['rating_numerator'].value_counts()
Out[51]:
12    290
11    250
10    227
13    153
9      82
8      48
7      21
14     13
6       9
5       4
3       3
4       2
75      1
27      1
2       1
Name: rating_numerator, dtype: int64

Best rated dogs

SPOTTED: There is a error in the rate extraction from text, because in the tweet the dog has a rate of 9.75 not 75.

Also the rating 27 is meant as a 11.27 ratings. So they weren't extracted correctly. Now they must be cleaned as well.

In [52]:
#df[df['rating_numerator'] == 27] # test
rows = df.shape[0]
df = df[df['rating_numerator'] < 27]
print("Dropped ", rows-df.shape[0], ' after removing wrong computed ratings.')
Dropped  2  after removing wrong computed ratings.

The maximum value that was given as rating:

In [53]:
df['rating_numerator'].max()
Out[53]:
14

The amount of tweets (dogs) that received the maximum rating:

In [137]:
df_best_rated = df[df['rating_numerator'] == df['rating_numerator'].max()]
df_best_rated = df_best_rated.reset_index(drop=True)
df_best_rated.shape
Out[137]:
(13, 24)

Those are the dogs that received the maxium rating 14/10:

In [138]:
for dog in range(df_best_rated.shape[0]):
    dog_url = df_best_rated['jpg_url'][dog]
    print('Dog name: ', df_best_rated['name'][dog])
    print('Dog breed: ', df_best_rated['p1'][dog])
    print('prediction rate: ', df_best_rated['p1_conf'][dog])
    print('rate: ', df_best_rated['rating_numerator'][dog], '/ 10')
    display(Image(dog_url, width=400, unconfined=True))
    urllib.request.urlretrieve(dog_url, "images/best_rated_dog_"+str(dog)+".jpg")
    
Dog name:  Cassie
Dog breed:  Pembroke
prediction rate:  0.511319
rate:  14 / 10
Dog name:  a
Dog breed:  Samoyed
prediction rate:  0.281463
rate:  14 / 10
Dog name:  Emmy
Dog breed:  French_bulldog
prediction rate:  0.839097
rate:  14 / 10
Dog name:  Cermet
Dog breed:  Chihuahua
prediction rate:  0.8765430000000001
rate:  14 / 10
Dog name:  Smiley
Dog breed:  Pembroke
prediction rate:  0.134081
rate:  14 / 10
Dog name:  Kuyu
Dog breed:  bloodhound
prediction rate:  0.777562
rate:  14 / 10
Dog name:  one
Dog breed:  golden_retriever
prediction rate:  0.649209
rate:  14 / 10
Dog name:  Doobert
Dog breed:  Bedlington_terrier
prediction rate:  0.392535
rate:  14 / 10
Dog name:  Gabe
Dog breed:  Pomeranian
prediction rate:  0.960199
rate:  14 / 10
Dog name:  Sundance
Dog breed:  Irish_setter
prediction rate:  0.5054960000000001
rate:  14 / 10
Dog name:  Bo
Dog breed:  standard_poodle
prediction rate:  0.351308
rate:  14 / 10
Dog name:  Gary
Dog breed:  French_bulldog
prediction rate:  0.709146
rate:  14 / 10
Dog name:  Ollie
Dog breed:  golden_retriever
prediction rate:  0.873233
rate:  14 / 10

Best predicted dog breed

The dog breed that is predicted the easiest and its amount in the dataset:

In [61]:
df_stat = pd.DataFrame(df.groupby('p1')['p1_conf'].mean())
df_stat['count'] = df.groupby('p1')['p1_conf'].count()
In [62]:
# the 10 best predicted dog breeds
df_stat.sort_values('p1_conf',ascending=False)[:10]
Out[62]:
p1_conf count
p1
komondor 0.972531 3
Tibetan_mastiff 0.936126 2
Brittany_spaniel 0.898093 6
keeshond 0.844431 4
bull_mastiff 0.833571 4
French_bulldog 0.803971 21
Bernese_mountain_dog 0.801816 10
Samoyed 0.790475 24
Pomeranian 0.767541 27
Leonberg 0.766436 2
In [139]:
df_best_predicted = df[df['p1'] == 'komondor']
df_best_predicted = df_best_predicted.reset_index(drop=True)

Those are the dogs that belong to the best predicted breed:

In [140]:
for dog in range(df_best_predicted.shape[0]):
    dog_url = df_best_predicted['jpg_url'][dog]
    print('Dog name: ', df_best_predicted['name'][dog])
    print('Dog breed: ', df_best_predicted['p1'][dog])
    print('prediction: ', df_best_predicted['p1_conf'][dog])
    print('rate: ', df_best_predicted['rating_numerator'][dog], '/ 10')
    display(Image(dog_url, width=400, unconfined=True))
    urllib.request.urlretrieve(dog_url, "images/best_predicted_breed_"+str(dog)+".jpg")
Dog name:  Napolean
Dog breed:  komondor
prediction:  0.974781
rate:  12 / 10
Dog name:  Remus
Dog breed:  komondor
prediction:  0.942856
rate:  11 / 10
Dog name:  an
Dog breed:  komondor
prediction:  0.999956
rate:  10 / 10

And here is the dog whose breed was best predicted: (also from the same breed as the best predicted breed)

In [172]:
# best predicted dog 
df_best_predicted = df[df['p1_conf'] == df['p1_conf'].max()]
df_best_predicted = df_best_predicted.reset_index(drop=True)
df_best_predicted.shape
Out[172]:
(1, 24)
In [197]:
for dog in range(df_best_predicted.shape[0]):
    dog_url = df_best_predicted['jpg_url'][dog]
    print('Dog name: ', df_best_predicted['name'][dog])
    print('Dog breed: ', df_best_predicted['p1'][dog])
    print('prediction: ', df_best_predicted['p1_conf'][dog])
    print('rate: ', df_best_predicted['rating_numerator'][dog], '/ 10')
    print('retweet: ', df_best_predicted['retweet_count'][dog])
    print('favorite: ', df_best_predicted['favorite_count'][dog])
    display(Image(dog_url, width=400, unconfined=True))
    urllib.request.urlretrieve(dog_url, "images/best_predicted_breed_"+str(dog)+".jpg")
Dog name:  an
Dog breed:  komondor
prediction:  0.999956
rate:  10 / 10
retweet:  496
favorite:  1040

Most rated dog breed

The 10 most rated dog breeds:

In [65]:
df_stat.sort_values('count',ascending=False)[:10]
Out[65]:
p1_conf count
p1
golden_retriever 0.730279 95
Pembroke 0.715845 68
Labrador_retriever 0.645026 65
Chihuahua 0.592743 60
pug 0.745156 42
chow 0.631058 34
toy_poodle 0.588837 32
Pomeranian 0.767541 27
Samoyed 0.790475 24
malamute 0.570732 23
In [141]:
df_most_rated = df[df['p1'] == 'golden_retriever']
df_most_rated = df_most_rated.reset_index(drop=True)
df_most_rated.shape
Out[141]:
(95, 24)

Let's look at 10 dogs of the most rated breed in the dataset-the golden retriever:

In [142]:
# show only 10 of 95 dogs
for dog in range(10):
    dog_url = df_most_rated['jpg_url'][dog]
    print('Dog name: ', df_most_rated['name'][dog])
    print('Dog breed: ', df_most_rated['p1'][dog])
    print('prediction rate: ', df_most_rated['p1_conf'][dog])
    print('rate: ', df_most_rated['rating_numerator'][dog], '/ 10')
    display(Image(dog_url, width=400, unconfined=True))
    urllib.request.urlretrieve(dog_url, "images/most_rated_breed"+str(dog)+".jpg")
Dog name:  Stuart
Dog breed:  golden_retriever
prediction rate:  0.953442
rate:  13 / 10
Dog name:  Jim
Dog breed:  golden_retriever
prediction rate:  0.714719
rate:  12 / 10
Dog name:  Zeke
Dog breed:  golden_retriever
prediction rate:  0.46976
rate:  13 / 10
Dog name:  Bella
Dog breed:  golden_retriever
prediction rate:  0.943082
rate:  5 / 10
Dog name:  Alfy
Dog breed:  golden_retriever
prediction rate:  0.762211
rate:  13 / 10
Dog name:  Bella
Dog breed:  golden_retriever
prediction rate:  0.913255
rate:  12 / 10
Dog name:  Benedict
Dog breed:  golden_retriever
prediction rate:  0.874566
rate:  13 / 10
Dog name:  Zoey
Dog breed:  golden_retriever
prediction rate:  0.841001
rate:  13 / 10
Dog name:  Boomer
Dog breed:  golden_retriever
prediction rate:  0.673664
rate:  13 / 10
Dog name:  Paisley
Dog breed:  golden_retriever
prediction rate:  0.945905
rate:  13 / 10

most favored dog

The most favored dog from twitter favorite counts:

In [164]:
df_most_favorited = df[df['favorite_count'] == df['favorite_count'].max()]
df_most_favorited = df_most_favorited.reset_index(drop=True)
df_most_favorited.shape
Out[164]:
(1, 24)
In [200]:
for dog in range(df_most_favorited.shape[0]):
    dog_url = df_most_favorited['jpg_url'][dog]
    print('Dog name: ', df_most_favorited['name'][dog])
    print('Dog breed: ', df_most_favorited['p1'][dog])
    print('favorited: ', df_most_favorited['favorite_count'][dog])
    print('prediction rate: ', df_most_rated['p1_conf'][dog])
    print('rate: ', df_most_favorited['rating_numerator'][dog], '/ 10')
    print('retweet: ', df_most_favorited['retweet_count'][dog])
    print('favorite: ', df_most_favorited['favorite_count'][dog])
    display(Image(dog_url, width=400, unconfined=True))
    urllib.request.urlretrieve(dog_url, "images/df_most_favorited_"+str(dog)+".jpg")
Dog name:  Stephan
Dog breed:  Chihuahua
favorited:  125519
prediction rate:  0.953442
rate:  13 / 10
retweet:  60236
favorite:  125519

How close are the other dogs with their favorite counts? Therefore we look at the histogram of the favorite counts:

In [194]:
df['favorite_count'].hist(bins=100)
Out[194]:
<matplotlib.axes._subplots.AxesSubplot at 0x1a195977f0>

most retweeted dog

Who is the most retweeded dog?

In [167]:
df_most_retweeted = df[df['retweet_count'] == df['retweet_count'].max()]
df_most_retweeted = df_most_retweeted.reset_index(drop=True)
df_most_retweeted.shape
Out[167]:
(1, 24)
In [168]:
for dog in range(df_most_retweeted.shape[0]):
    dog_url = df_most_retweeted['jpg_url'][dog]
    print('Dog name: ', df_most_retweeted['name'][dog])
    print('Dog breed: ', df_most_retweeted['p1'][dog])
    print('favorited: ', df_most_retweeted['favorite_count'][dog])
    print('rate: ', df_most_retweeted['rating_numerator'][dog], '/ 10')
    display(Image(dog_url, width=400, unconfined=True))
    urllib.request.urlretrieve(dog_url, "images/df_most_retweeted_"+str(dog)+".jpg")
Dog name:  Stephan
Dog breed:  Chihuahua
favorited:  125519
rate:  13 / 10

Visualization

In [241]:
sns.set_style("whitegrid")
sns.set(rc={'figure.figsize':(20,12)})
sns.set(font_scale=2.0)
sns.set_palette("pastel")

Used dog vocabularies

In [242]:
df_words = df[['puppo', 'doggo', 'floofer', 'pupper']]
df_words.apply(pd.value_counts).plot(kind='bar', 
                                     title='all types')
fig = ax.get_figure()
fig.savefig("plots/dog_vocabularies.png")

Ratings over time (colored by retweet count)

In [243]:
df['timestamp'] = pd.to_datetime(df['timestamp'])
In [244]:
ax = sns.scatterplot(x="timestamp", y="rating_numerator",
                      hue="retweet_count", size="retweet_count",alpha=.8,
                      data=df)

ax.set_xlim(df['timestamp'].min(), df['timestamp'].max())
fig = ax.get_figure()
fig.savefig("plots/retweet_rating_over_time_b.png")

The ratings have changed over time. The ratings have increased.

Ratings over time (colored by favorite count)

In [245]:
ax = sns.scatterplot(x="timestamp", y="rating_numerator",
                      hue="favorite_count", size="favorite_count",alpha=.8,
                      data=df)

ax.set_xlim(df['timestamp'].min(), df['timestamp'].max())
fig = ax.get_figure()
fig.savefig("plots/favorit_rating_over_time_b.png")

If we look at the favorite count and the retweet count in the diagrams above, then we can see, that they might be similar.

Retweet and Favorite correlation

In [246]:
ax = sns.scatterplot(x="favorite_count", y="retweet_count",hue="rating_numerator", 
                     alpha=.8,
                    data=df)
fig = ax.get_figure()
fig.savefig("plots/retweet_favorit_scatter_b.png")

The retweet and favorite count correlate.

Retweet count over Rating (colored by favorite count)

In [247]:
ax = sns.scatterplot(x="rating_numerator", y="retweet_count",hue="favorite_count", 
                     alpha=.8,
                    data=df)
fig = ax.get_figure()
fig.savefig("plots/retweet_rating_scatter_b.png")

The low rating values are not retweeted well. The higher rating values (x-axis) are retweeted more often (y-axis). And also favored more often (color).

In [235]:
ax = sns.scatterplot(x="favorite_count", y="p1_conf", 
                     alpha=.8,
                    data=df)
fig = ax.get_figure()
fig.savefig("plots/favorite_p1conf_scatter.png")

There is no correlatin between the prediction confidence of the neural network and the favorite count.